Reinforcement learning signals predict future decisions.
Abstract
Optimal behavior in a competitive world requires the flexibility to adapt decision strategies based on recent outcomes. In the present study, we tested the hypothesis that this flexibility emerges through a reinforcement learning process, in which reward prediction errors are used dynamically to adjust representations of decision options. We recorded event-related brain potentials (ERPs) while subjects played a strategic economic game against a computer opponent to evaluate how neural responses to outcomes related to subsequent decision-making. Analyses of ERP data focused on the feedback-related negativity (FRN), an outcome-locked potential thought to reflect a neural prediction error signal. Consistent with predictions of a computational reinforcement learning model, we found that the magnitude of ERPs after losing to the computer opponent predicted whether subjects would change decision behavior on the subsequent trial. Furthermore, FRNs to decision outcomes were disproportionately larger over the motor cortex contralateral to the response hand that was used to make the decision. These findings provide novel evidence that humans engage a reinforcement learning process to adjust representations of competing decision options.
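The idea of a reward prediction error adjusting the representation of a decision option can be illustrated with a minimal sketch. This is not the computational model used in the study, whose details the abstract does not give. It assumes a simple delta-rule value update and a softmax choice rule, and the names `update_values` and `choice_probabilities`, the parameters `learning_rate` and `inverse_temperature`, and the outcome coding (+1 for a win, -1 for a loss) are all hypothetical choices made for illustration.

```python
import math


def update_values(values, choice, outcome, learning_rate=0.3):
    """Delta-rule update (an assumed, illustrative rule).

    The prediction error is the obtained outcome minus the value
    currently expected for the chosen option. Only the chosen
    option's value is adjusted.
    """
    prediction_error = outcome - values[choice]
    updated = list(values)
    updated[choice] = values[choice] + learning_rate * prediction_error
    return updated, prediction_error


def choice_probabilities(values, inverse_temperature=2.0):
    """Softmax over option values (an assumed choice rule)."""
    highest = max(values)  # subtract the max for numerical stability
    weights = [math.exp(inverse_temperature * (v - highest)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Two decision options that start with equal value.
values = [0.0, 0.0]

# The agent picks option 0 and loses (outcome coded as -1).
values_after_loss, error = update_values(values, choice=0, outcome=-1.0)
probabilities = choice_probabilities(values_after_loss)

# After the loss, option 1 is more likely than option 0 on the next trial.
switch_probability = probabilities[1]
```

Under these assumptions, a larger negative prediction error lowers the value of the chosen option further and so raises the probability of switching on the next trial, which is the qualitative pattern the abstract reports for losses.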
Similar resources
Adaptive Range Coding
This paper examines a class of neuron based learning systems for dynamic control that rely on adaptive range coding of sensor inputs. Sensors are assumed to provide binary coded range vectors that coarsely describe the system state. These vectors are input to neuron-like processing elements. Output decisions generated by these "neurons" in turn affect the system state, subsequently producing ne...
Dorsal Striatal-midbrain Connectivity in Humans Predicts How Reinforcements Are Used to Guide Decisions
It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differently involved in reinforcement learning especially as actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to the DS and the VS seem to play a critical role i...
Introspective Agents: Confidence Measures for General Value Functions
Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions. While such adaptive agents may leverage engineered knowledge, they will require the capacity to construct and evaluate knowledge themselves from their own experience in a bottom-up, constructivist fashion. This position paper builds on the idea of encoding knowledge as temporall...
Ground Delay Program Analytics with Behavioral Cloning and Inverse Reinforcement Learning
We used historical data to build two types of model that predict Ground Delay Program implementation and also produce insights into how and why those implementation decisions are made. More specifically, we built behavioral cloning and inverse reinforcement learning models that predict hourly Ground Delay Program implementation at Newark Liberty International and San Francisco International air...
Neural signature of fictive learning signals in a sequential investment task.
Reinforcement learning models now provide principled guides for a wide range of reward learning experiments in animals and humans. One key learning (error) signal in these models is experiential and reports ongoing temporal differences between expected and experienced reward. However, these same abstract learning models also accommodate the existence of another class of learning signal that tak...
Journal:
- The Journal of neuroscience : the official journal of the Society for Neuroscience
Volume 27, Issue 2
Pages: -
Publication date: 2007